{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Stolen Szechuan Sauce - Data Upload.ipynb",
      "provenance": [],
      "private_outputs": true,
      "collapsed_sections": [],
      "toc_visible": true,
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/google/timesketch/blob/master/notebooks/Stolen_Szechuan_Sauce_Data_Upload.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "myeER0OQAt1r"
      },
      "source": [
        "# The Stolen Szechuan Sauce\n",
        "\n",
        "This is a simple colab demonstrating one way of uploading data from the Stolen Szechuan Sauce challenge (found [here](https://dfirmadness.com/the-stolen-szechuan-sauce/)).\n",
        "\n",
        "This colab will not go into any analysis of the data, only uploading data to a sketch.\n",
        "\n",
        "A word of notice, this notebook can be run on the cloud runtimes, but then few changes need to be made. However it is assumed that you are connecting to a local runtime, see [instructions here](https://research.google.com/colaboratory/local-runtimes.html). This makes it easier to import data that is already on your system.\n",
        "\n",
        "For a more generic instructions of Colab can be [found here](https://colab.research.google.com/github/google/timesketch/blob/master/notebooks/colab-timesketch-demo.ipynb)"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "40A429x4Ajfc",
        "cellView": "form"
      },
      "source": [
        "# @title Import libraries\n",
        "# @markdown This cell loads libraries that we will use througout the notebook.\n",
        "import io\n",
        "import os\n",
        "import codecs\n",
        "\n",
        "import altair as alt\n",
        "import numpy as np\n",
        "import pandas as pd\n",
        "\n",
        "from timesketch_api_client import config\n",
        "from timesketch_import_client import helper\n",
        "from timesketch_import_client import importer"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ITjKgNRtBBdx"
      },
      "source": [
        "## AutoRuns File\n",
        "\n",
        "Let's read the file that contains the output of the AutoRuns file."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "rxtTjR8RA8SB",
        "cellView": "both"
      },
      "source": [
        "# @markdown This needs to be changed to reflect the correct path.\n",
        "\n",
        "PATH_TO_FOLDER = '/mnt/chromeos/MyFiles/Downloads' # @param {type: \"string\"}\n",
        "# @markdown the path to the folder will be used for all subsequent paths\n",
        "# @markdown as a root folder.\n",
        "AUTO_RUN_FILENAME = 'autoruns-desktop-sdn1rpt.csv' # @param {type: \"string\"}\n",
        "\n",
        "PATH_TO_CSV = os.path.join(PATH_TO_FOLDER, AUTO_RUN_FILENAME)\n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LFKpjpo66pve"
      },
      "source": [
        "Now we can read the content of the file:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "L05NWGvRBC0R"
      },
      "source": [
        "df = None\n",
        "with codecs.open(PATH_TO_CSV, 'r', encoding='utf-8', errors='replace') as fh:\n",
        "  df = pd.read_csv(fh, error_bad_lines=False)\n",
        "\n",
        "print(df.shape)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TORG4J8CHI4P"
      },
      "source": [
        "Quite a few errors, let's look at the data."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "lOcjM6ECFVkj"
      },
      "source": [
        "df.head(3)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-_NPKhANJSc-"
      },
      "source": [
        "This does not look right, let's look at the content of the file, let's look at the hex code (for that we will use the `!` which allows us to execute shell commands)"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "t-wWKfrEJW50"
      },
      "source": [
        "!dd if=$PATH_TO_CSV bs=128 count=1 | xxd"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ERWnpjHNJflb"
      },
      "source": [
        "This file is not UTF-8, it's encoded as UTF-16, so let's now read the file in again, this time as UTF-16"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "UPdW8UvlFvn1"
      },
      "source": [
        "df = None\n",
        "with codecs.open(PATH_TO_CSV, 'r', encoding='utf-16', errors='replace') as fh:\n",
        "  df = pd.read_csv(fh, error_bad_lines=False)\n",
        "\n",
        "print(df.shape)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8B4ew2-PJpj9"
      },
      "source": [
        "No errors, let's look at the content"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "3rs0U_wZH59M"
      },
      "source": [
        "df.head(3)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "8hxAw8QKJslL"
      },
      "source": [
        "This looks correct now, let's make the data a bit more Timesketch ready.\n",
        "\n",
        "The first thing is to create a datetime field that contains the timestamp. We will use the built-in conversion in pandas:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "qtn7JXS-Iob-"
      },
      "source": [
        "df['datetime'] = pd.to_datetime(df['Time'])"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7deDQiar682s"
      },
      "source": [
        "The next thing is to add few fields that Timesketch expects:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "1sCIIv0YJ57-"
      },
      "source": [
        "df['data_type'] = 'autoruns:record'\n",
        "df['timestamp_desc'] = 'Entry Recorded'\n",
        "df['message'] = 'AutoRun: [' + df['Category'] + ' - ' + df['Profile'] + '] ' + df['Image Path']\n",
        "\n",
        "df.head(3)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "v9Hivygb7Axp"
      },
      "source": [
        "We can take a quick look at the data frame we just read in:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "hQ7WkDPOM4mk"
      },
      "source": [
        "df.info()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cUUjqZWUKIna"
      },
      "source": [
        "### Upload To TS\n",
        "\n",
        "Let's upload this data to TS. For that we first need to get a copy of the Timesketch client, then we will need to get a copy of a sketch object."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "VZWk8dRWKDRr"
      },
      "source": [
        "ts_client = config.get_client()\n",
        "[(x.id, x.name) for x in ts_client.list_sketches()]"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "0VEfTaLOBtZs"
      },
      "source": [
        "# @markdown This needs to be changed to reflect the correct sketch.\n",
        "\n",
        "SKETCH_ID = 6 # @param {type: \"integer\"}"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DFKwoCTi7NXm"
      },
      "source": [
        "Now we are ready to upload the data. The sketch that we want to use is the one with the ID of 6.\n",
        "\n",
        "We will use the importer client to import the data as a data frame, for that we need to setup an import streamer:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "riyNlxSXKh4m"
      },
      "source": [
        "sketch = ts_client.get_sketch(SKETCH_ID)\n",
        "import_helper = helper.ImportHelper() \n",
        "\n",
        "with importer.ImportStreamer() as streamer:\n",
        "  streamer.set_sketch(sketch)\n",
        "  streamer.set_config_helper(import_helper) \n",
        "\n",
        "  streamer.set_timeline_name('autoruns_desktop_sdn1rpt')\n",
        "\n",
        "  streamer.add_data_frame(df)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xdJJbJfL7YVq"
      },
      "source": [
        "What we did there was create a copy of the Import client, and then configured it (defining the sketch to use and what the name of the timeline we are going to choose)."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6AWrR8-8MSs0"
      },
      "source": [
        "Now this data has been uploaded to the sketch in TS but there is an error in TS import, that is if we go and visit the sketch we can see that the sketch hasn't been uploaded correctly, so let's copy the error here.\n",
        "\n",
        "Then we will delete/remove the timeline from the sketch so that there isn't an error one in TS.\n",
        "\n",
        "\n",
        "```\n",
        "Traceback (most recent call last):\n",
        "  File \"/usr/local/lib/python3.6/dist-packages/timesketch/lib/tasks.py\", line 558, in run_csv_jsonl\n",
        "    for event in read_and_validate(file_handle):\n",
        "  File \"/usr/local/lib/python3.6/dist-packages/timesketch/lib/utils.py\", line 225, in read_and_validate_jsonl\n",
        "    linedict['timestamp'] = parser.parse(linedict['datetime'])\n",
        "  File \"/usr/local/lib/python3.6/dist-packages/dateutil/parser/_parser.py\", line 1374, in parse\n",
        "    return DEFAULTPARSER.parse(timestr, **kwargs)\n",
        "  File \"/usr/local/lib/python3.6/dist-packages/dateutil/parser/_parser.py\", line 646, in parse\n",
        "    res, skipped_tokens = self._parse(timestr, **kwargs)\n",
        "  File \"/usr/local/lib/python3.6/dist-packages/dateutil/parser/_parser.py\", line 725, in _parse\n",
        "    l = _timelex.split(timestr)         # Splits the timestr into tokens\n",
        "  File \"/usr/local/lib/python3.6/dist-packages/dateutil/parser/_parser.py\", line 207, in split\n",
        "    return list(cls(s))\n",
        "  File \"/usr/local/lib/python3.6/dist-packages/dateutil/parser/_parser.py\", line 76, in __init__\n",
        "    '{itype}'.format(itype=instream.__class__.__name__))\n",
        "TypeError: Parser must be a string or character stream, not NoneType\n",
        "```\n",
        "\n",
        "This does indicate issues with datetime parsing. Let's take a closer look at the data frame in question. The first thing we check is to see whether there are any empty dates in the frame:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "srRmvAEKMs0S"
      },
      "source": [
        "df[df.datetime.isna()]"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "R-sFv9JmOj9V"
      },
      "source": [
        "The check we use is `isna` which checks to see if a field is empty or (Not a number).\n",
        "\n",
        "There are quite a few records with an empty date field. Let's exclude those. For that we will need to upload a slice of the data frame that doesn't contain any records with an empty date."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "s5U6sWVEOPDV"
      },
      "source": [
        "df.shape"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "zcm01gnJOoWB"
      },
      "source": [
        "df[~df.datetime.isna()].shape"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "G5Xsyhl1OsHC"
      },
      "source": [
        "There seem to be 9 records without a date... let's remove them from the upload (by just uploading a slice of the data)."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "gdrtzphrOqwM"
      },
      "source": [
        "sketch = ts_client.get_sketch(SKETCH_ID)\n",
        "import_helper = helper.ImportHelper() \n",
        "\n",
        "with importer.ImportStreamer() as streamer:\n",
        "  streamer.set_sketch(sketch)\n",
        "  streamer.set_config_helper(import_helper) \n",
        "\n",
        "  streamer.set_timeline_name('autoruns_desktop_sdn1rpt_w_time')\n",
        "\n",
        "  streamer.add_data_frame(df[~df.datetime.isna()].copy())"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UKmKE3hbgHaR"
      },
      "source": [
        "### Server AutoRun File\n",
        "\n",
        "Let's take the server Autoruns next"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "vDFCbU41O1yx"
      },
      "source": [
        "DC_FILENAME = 'autorunsc-citadel-dc01.csv' # @param {type: \"string\"}\n",
        "\n",
        "dc_path = os.path.join(PATH_TO_FOLDER, DC_FILENAME)\n",
        "\n",
        "auto_server_df = None\n",
        "with codecs.open(dc_path, 'r', encoding='utf-16', errors='replace') as fh:\n",
        "  auto_server_df = pd.read_csv(fh, error_bad_lines=False)\n",
        "\n",
        "print(auto_server_df.shape)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "mTYjxb01grgw"
      },
      "source": [
        "Let's `groom` it for TS"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "-sxxS7CZgi3k"
      },
      "source": [
        "auto_server_df['datetime'] = pd.to_datetime(df['Time'])\n",
        "auto_server_df['data_type'] = 'autoruns:record'\n",
        "auto_server_df['timestamp_desc'] = 'Entry Recorded'\n",
        "auto_server_df['message'] = 'AutoRun: [' + auto_server_df['Category'] + ' - ' + auto_server_df['Profile'] + '] ' + auto_server_df['Image Path']\n",
        "\n",
        "auto_server_df.head(3)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "R8OhonO8g0ip"
      },
      "source": [
        "And upload (using the same method as before)"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "SUeAQtEqgvfi"
      },
      "source": [
        "sketch = ts_client.get_sketch(SKETCH_ID)\n",
        "import_helper = helper.ImportHelper() \n",
        "\n",
        "with importer.ImportStreamer() as streamer:\n",
        "  streamer.set_sketch(sketch)\n",
        "  streamer.set_config_helper(import_helper) \n",
        "\n",
        "  streamer.set_timeline_name('autoruns_citadel_dc01_w_time')\n",
        "\n",
        "  streamer.add_data_frame(auto_server_df[~auto_server_df.datetime.isna()].copy())"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Mh2SAJOYhBBU"
      },
      "source": [
        "Now we've got both autoruns in there\n",
        "\n",
        "## Plaso Files\n",
        "\n",
        "Let's in the plaso files, using:\n",
        "\n",
        "```shell\n",
        "$ timesketch_importer --sketch_id 6 20200918_0417_DESKTOP-SDN1RPT.plaso \n",
        "```\n",
        "\n",
        "or using the importer client in colab"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "EiotgxxjxMUE",
        "cellView": "form"
      },
      "source": [
        "DESKTOP_PATH = '20200918_0417_DESKTOP-SDN1RPT.plaso' #@param {type: \"string\"}\n",
        "SERVER_PATH = '20200918_0347_CDrive_new.plaso' #@param {type: \"string\"}\n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Q4ePXTVZg97K"
      },
      "source": [
        "sketch = ts_client.get_sketch(SKETCH_ID)\n",
        "import_helper = helper.ImportHelper() \n",
        "\n",
        "with importer.ImportStreamer() as streamer:\n",
        "  streamer.set_sketch(sketch)\n",
        "  streamer.set_config_helper(import_helper) \n",
        "\n",
        "  streamer.set_timeline_name('desktop-sdn1rpt.plaso')\n",
        "  streamer.add_file(os.path.join(PATH_TO_FOLDER, DESKTOP_PATH))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "UZYiobwO0-uj"
      },
      "source": [
        "sketch = ts_client.get_sketch(SKETCH_ID)\n",
        "import_helper = helper.ImportHelper() \n",
        "\n",
        "with importer.ImportStreamer() as streamer:\n",
        "  streamer.set_sketch(sketch)\n",
        "  streamer.set_config_helper(import_helper) \n",
        "\n",
        "  streamer.set_timeline_name('dc1_plaso')\n",
        "  streamer.add_file(os.path.join(PATH_TO_FOLDER, SERVER_PATH))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7f_qM5Xd8HCL"
      },
      "source": [
        "## PCAP Files\n",
        "\n",
        "Another important factor in the challenge are the provided PCAP files. We need to get them checked into TS.\n",
        "\n",
        "Let's start parsing them. There are essentially two different methods of doing so:\n",
        "\n",
        "1. Using Wireshark to do the parsing and work with a CSV file\n",
        "2. Parse the PCAP file using python libraries and use that.\n",
        "\n",
        "Let's explore both options."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GfEUJ4z5NI6p"
      },
      "source": [
        "### Wireshark Route\n",
        "\n",
        "Wireshark has a neat feature to export a set of packages or all packets into various other formats. This also includes CSV. As Timesketch is able to handle CSV data, this is worth an attempt.\n",
        "\n",
        "To export packets to csv use:\n",
        "\n",
        "```Wireshark → File → Export Packet Dissections```\n",
        "\n",
        "And choose CSV.\n",
        "\n",
        "The exported CSV will include all displayed columns. One thing to note here is that the time by default is relative to the first packet in the capture. You need to adjust that. \n",
        "\n",
        "Go to:\n",
        "\n",
        "```Wireshark → View → Time Display Format```\n",
        "\n",
        "And select ```UTC Date and Time of the Day```\n",
        "\n",
        "To learn more about Time settings in Wireshark, visit wireshark.org\n",
        "\n",
        "The now exported CSV looks promising. Some things need to be adjusted like the datetime column name and the format, but we already know how to do that from the autoruns csv file.\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "MHApjgTmFZro",
        "cellView": "form"
      },
      "source": [
        "# @markdown Change the path to what fits on your system.\n",
        "PCAP_CSV_PATH = 'all_packets.csv' #@param {type: \"string\"}"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ruqzafD72LLH"
      },
      "source": [
        "pcap_df = pd.read_csv(\n",
        "    os.path.join(PATH_TO_FOLDER, PCAP_CSV_PATH),\n",
        "    encoding='utf-8', parse_dates=False)\n",
        "\n",
        "pcap_df.shape"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7xA4laexGYFl"
      },
      "source": [
        "#### Modify DataFrame\n",
        "\n",
        "Now let's rename fields and add other fields to make it work better for Timesketch."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "wz3y2J0hOE-y"
      },
      "source": [
        "pcap_df.head(3)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ylJwSl15OGaC"
      },
      "source": [
        "Now we've got a general idea about how the data looks like, so we can change it."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "tXNcE60jEEIx"
      },
      "source": [
        "# convert the 'Date' column to datetime format \n",
        "pcap_df['Time']= pd.to_datetime(pcap_df['Time']) \n",
        "pcap_df['data_type'] = 'pcap:wireshark:entry'\n",
        "pcap_df['timestamp_desc'] = 'Time Logged'\n",
        "pcap_df['source_short'] = 'LOG'\n",
        "pcap_df['source'] = 'Network'\n",
        "pcap_df['message'] = '[' + pcap_df['Protocol'] + '] ' + pcap_df['Info'] + ' (' + pcap_df['Source'] + ':' + pcap_df['src port'].astype('str') + ' -> ' + pcap_df['Destination'] + ':' + pcap_df['DST port'].astype('str') + ')'\n",
        "\n",
        "pcap_df = pcap_df.rename(columns={'Time': 'datetime'})\n",
        "\n",
        "pcap_df.info()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AShtCJv4PucM"
      },
      "source": [
        "Let's look at the data frame now"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "SsL4agghGcKT"
      },
      "source": [
        "pcap_df.head(3)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hpjRqMdlGfpb"
      },
      "source": [
        "Adjust ports"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ik9sjjIdGjLD"
      },
      "source": [
        "pcap_df['DST port'] = pcap_df['DST port'].astype(pd.Int32Dtype())\n",
        "pcap_df['src port'] = pcap_df['src port'].astype(pd.Int32Dtype())\n",
        "\n",
        "pcap_df.head(3)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "tITKx61LGn_I"
      },
      "source": [
        "#### Upload CSV\n",
        "\n",
        "Now we can upload the data to TS"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "x2dVlUyLHzzX"
      },
      "source": [
        "sketch = ts_client.get_sketch(SKETCH_ID)\n",
        "import_helper = helper.ImportHelper() \n",
        "\n",
        "with importer.ImportStreamer() as streamer:\n",
        "  streamer.set_sketch(sketch)\n",
        "  streamer.set_config_helper(import_helper) \n",
        "\n",
        "  streamer.set_timeline_name('wireshark_decoded_pcap')\n",
        "\n",
        "  streamer.add_data_frame(pcap_df[~pcap_df.datetime.isna()].copy())"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cDCabjJFLOYE"
      },
      "source": [
        "### Using Python Libraries\n",
        "\n",
        "Now we can use python libraries, such as scapy. This is a much slower method than using Wireshark and a CSV. It is however more flexible, there are more things that can be done here.\n",
        "\n",
        "(for this we also have a progress bar since this will take some time to execute)\n",
        "\n",
        "Make sure that your environment has scapy installed, if not you can execute:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "FzAVgN5PLR5O"
      },
      "source": [
        "!pip install -q scapy\n",
        "!pip install -q tqdm\n",
        "!pip install -q ipywidgets"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Cz0xDJm5LUxN"
      },
      "source": [
        "# @markdown Import needed libraries for using scapy.\n",
        "import binascii\n",
        "import datetime\n",
        "import pytz\n",
        "\n",
        "import tqdm\n",
        "from tqdm import tqdm_notebook, tnrange\n",
        "\n",
        "import ipywidgets as widgets\n",
        "\n",
        "from scapy import all as scapy_all"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "WCxbpYo5LX8G",
        "cellView": "form"
      },
      "source": [
        "# @markdown Change this to the correct path on your system.\n",
        "PCAP_PATH = 'case001.pcap' # @param {type: \"string\"}\n"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "G-34CmPKUss8"
      },
      "source": [
        "Let's read in the PCAP file, word of warning, this will take a **really long time**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "gpL9tqlALZVe"
      },
      "source": [
        "packets = scapy_all.rdpcap(\n",
        "    os.path.join(PATH_TO_FOLDER, PCAP_PATH))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "TG57s2ZOLbM1"
      },
      "source": [
        "# @markdown check how many packets are in there\n",
        "packets"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_tDnPoVYWBLK"
      },
      "source": [
        "Now we can start going through the packets to generate a data frame, since that's what we want so that we can upload data to TS\n",
        "\n",
        "To convert the data to a dataframe we are borrowing code from : https://github.com/secdevopsai/Packet-Analytics/blob/master/Packet-Analytics.ipynb (see the [medium post here](https://medium.com/hackervalleystudio/learning-packet-analysis-with-data-science-5356a3340d4e))"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "V9B7TMZJLdmS"
      },
      "source": [
        "# @markdown Collect field names from IP/TCP/UDP\n",
        "# @markdown *These will be columns in DF*\n",
        "ip_fields = [(field.name) for field in scapy_all.IP().fields_desc]\n",
        "tcp_fields = [(field.name) for field in scapy_all.TCP().fields_desc]\n",
        "udp_fields = [(field.name) for field in scapy_all.UDP().fields_desc]\n",
        "\n",
        "print(ip_fields)\n",
        "print(tcp_fields)\n",
        "print(udp_fields)\n",
        "\n",
        "ip_fields_new = [(\"ip_\"+field.name) for field in scapy_all.IP().fields_desc]\n",
        "tcp_fields_new = [(\"tcp_\"+field.name) for field in scapy_all.TCP().fields_desc]\n",
        "udp_fields_new = [(\"udp_\"+field.name) for field in scapy_all.UDP().fields_desc]\n",
        "\n",
        "dataframe_fields = ip_fields_new + ['time'] + tcp_fields_new + ['payload', 'datetime', 'raw']"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "a06-AzxItGfC"
      },
      "source": [
        "#### Upload Data To Timesketch\n",
        "\n",
        "Now that we've got the columns sorted out, we can now move on to go through each of the packets, create a dict and upload that directly to Timesketch.\n",
        "\n",
        "Let's use the code from our previous example, except this time adding a progress bar. We are going to stream the results from the parsing directly to Timesketch.\n",
        "\n",
        "**Word of warning: this will also take considerable amount of time to execute and it may even crash your notebook. You have been warned!**"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "fqYR84a7Liks"
      },
      "source": [
        "sketch = ts_client.get_sketch(SKETCH_ID)\n",
        "import_helper = helper.ImportHelper() \n",
        "\n",
        "with importer.ImportStreamer() as streamer:\n",
        "  streamer.set_sketch(sketch)\n",
        "  streamer.set_config_helper(import_helper)\n",
        "\n",
        "  # Lower the threshold, which defines how many entries we go through before we flush the buffer.\n",
        "  streamer.set_entry_threshold(1000)\n",
        "  streamer.set_data_type('scapy:pcap:entry')\n",
        "  streamer.set_timestamp_description('PCAP Entry')\n",
        "  streamer.set_timeline_name('network_pcap_with_scapy')\n",
        "  streamer.set_message_format_string('{raw:s}')\n",
        "\n",
        "  for packet in tqdm_notebook(packets[scapy_all.IP]):\n",
        "    # Field array for each row of DataFrame\n",
        " \n",
        "    field_values = []\n",
        "    # Add all IP fields to dataframe\n",
        "    for field in ip_fields:\n",
        "      if field == 'options':\n",
        "        # Retrieving number of options defined in IP Header\n",
        "        field_values.append(len(packet[scapy_all.IP].fields[field]))\n",
        "      else:\n",
        "        field_values.append(packet[scapy_all.IP].fields[field])\n",
        "    \n",
        "    field_values.append(packet.time)\n",
        "    layer_type = type(packet[scapy_all.IP].payload)\n",
        "    for field in tcp_fields:\n",
        "      try:\n",
        "        if field == 'options':\n",
        "          field_values.append(len(packet[layer_type].fields[field]))\n",
        "        else:\n",
        "          field_values.append(packet[layer_type].fields[field])\n",
        "      except:\n",
        "        field_values.append(None)\n",
        "    \n",
        "    # Append payload\n",
        "    field_values.append(len(packet[layer_type].payload))\n",
        " \n",
        "    date_value = datetime.datetime.fromtimestamp(packet.time, tz=pytz.utc)\n",
        "    field_values.append(date_value.isoformat())\n",
        "    field_values.append(str(packet.show2))\n",
        "\n",
        "    # Create a dict and upload it to timesketch.\n",
        "    packet_dict = dict(zip(dataframe_fields, field_values))\n",
        "    ip_flags = packet_dict.get('ip_flags')\n",
        "    if not ip_flags is None:\n",
        "      packet_dict['ip_flags'] = ip_flags.names\n",
        "\n",
        "    tcp_flags = packet_dict.get('tcp_flags')\n",
        "    if not tcp_flags is None:\n",
        "      packet_dict['tcp_flags'] = tcp_flags.names\n",
        "\n",
        "    del packet_dict['time']\n",
        "\n",
        "    streamer.add_dict(packet_dict)"
      ],
      "execution_count": null,
      "outputs": []
    }
  ]
}
